Marine Group A (MGA) is a deeply branching and uncultivated phylum of bacteria. Although their 
functional roles remain elusive, MGA subgroups are particularly abundant and diverse in oxygen 
minimum zones and permanent or seasonally stratified anoxic basins, suggesting metabolic adaptation 
to oxygen deficiency. Here, we expand a previous survey of MGA diversity in O₂-deficient waters of 
the Northeast subarctic Pacific Ocean (NESAP) to include Saanich Inlet (SI), an anoxic fjord with 
seasonal O₂ gradients and periodic sulfide accumulation. Phylogenetic analysis of small subunit 
ribosomal RNA (16S rRNA) gene clone libraries recovered five previously described MGA subgroups 
and defined three novel subgroups (SHBH1141, SHBH391, and SHAN400) in SI. To discern the functional 
properties of MGA residing along gradients of O₂ in the NESAP and SI, we identified and sequenced to 
completion 14 fosmids harboring MGA-associated 16S rRNA genes from a collection of 46 fosmid 
libraries sourced from NESAP and SI waters. Comparative analysis of these fosmids, in addition to four 
publicly available MGA-associated large-insert DNA fragments from Hawaii Ocean Time-series and 
Monterey Bay, revealed widespread genomic differentiation proximal to the ribosomal RNA operon that 
did not consistently reflect subgroup partitioning patterns observed in 16S rRNA gene clone libraries. 
Predicted protein-coding genes associated with adaptation to O₂ deficiency and sulfur-based energy 
metabolism were detected on multiple fosmids, including polysulfide reductase (psrABC), implicated in 
dissimilatory polysulfide reduction to hydrogen sulfide and dissimilatory sulfur oxidation. These results 
suggest a potential role for specific MGA subgroups in the marine sulfur cycle. 
Introduction 

Marine Group A (MGA) bacteria were first identified in 
small subunit ribosomal RNA (16S rRNA) gene clone 
libraries generated from surface waters of the Atlantic 
and Pacific Oceans (Fuhrman et al., 1993; Gordon and 
Giovannoni, 1996; Fuhrman and Davis, 1997). MGA, 
originally referred to as the 'SAR406 gene lineage', 
represents a deeply branching lineage of bacteria 
related to the genus Fibrobacter and the green sulfur 
bacterial (GSB) phylum, which includes the genus 
Chlorobium (Gordon and Giovannoni, 1996). To date, 
MGA remains a candidate phylum with no cultured 
representatives. Modern phylogenetic analyses indi- 
cate that the closest cultivated relatives of MGA are 
Caldithrix abyssi and Caldithrix palaeochoryensis, 
both belonging to the phylum Caldithrix. These 
isolates are anaerobic, mixotrophic thermophiles 
obtained from hydrothermal vent and sediment environments, 
respectively (Miroshnichenko et al., 2003, 
2010). Although ubiquitous in the dark ocean, MGA 
are most prevalent and diverse in interior regions of the 
ocean with distinct oxyclines, such as in oxygen 
minimum zones (OMZs) and permanent or seasonally 
stratified anoxic basins (Madrid et al., 2001; Fuchs 
et al., 2005; Stevens and Ulloa, 2008; Schattenhofer 
et al., 2009; Zaikova et al., 2010; Allers et al., 2012; 
Wright et al., 2012). At present, the metabolic capacity 
and ecological roles of MGA in OMZs or in the ocean 
at large remain entirely unknown. Given that OMZs are 
expanding and intensifying (Emerson et al., 2004; 
Whitney et al., 2007; Bograd et al., 2008; Stramma 
et al., 2008; Helm et al., 2011), primarily as a result of 
global climate change (Keeling et al., 2010), it is of 
increasing importance to define the metabolic diversity 
and ecosystem function of dominant microorganisms 
within these systems in order to predict the systemic 
impacts of OMZ expansion on ocean ecology and 
biogeochemistry. 

The distribution of certain MGA subgroups was 
reported as being negatively correlated with the 
concentration of dissolved oxygen (O₂) in the OMZ 
of the Northeast subarctic Pacific Ocean (NESAP), 
suggesting a potential role for O₂ as a driver of MGA 
habitat selection and metabolic adaptation in these 
waters (Allers et al., 2012). Determining the extent to 
which 16S rRNA-based patterns of MGA distribu- 
tion represent ecological types (ecotypes) differen- 
tiating in response to selective environmental 
pressures such as O₂ deficiency requires genome-scale 
sequence data associated with multiple MGA 
subgroups to query for changes in genome composi- 
tion that might promote differential fitness across 
the oxycline. Here, we explore potential niche 
partitioning among and within MGA subgroups 
along oxic to anoxic-sulfidic gradients of dissolved 
O₂ in the North Pacific Ocean. In the absence of 
reference genomes representative of the MGA 
candidate phylum, we use phylogenetic anchor 
screening to identify 18 large-insert DNA fragments 
affiliated with various MGA subgroups as a direct 
route to studying MGA function. We describe and 
compare the genetic content and organization of 
these large-insert DNA fragments to gain preliminary 
insights into MGA metabolism. 



Materials and methods 

Sample collection and processing in the NESAP 
Sampling in the NESAP was conducted via multiple 
hydrocasts using a Conductivity, Temperature, 
Depth (CTD) rosette water sampler aboard the CCGS 
John P. Tully during Line P cruises 2009-09 in June 
2009 (major stations: P4 (48°39.0N, 126°4.0W), 7 
June; P12 (48°58.2N, 130°40.0W), 9 June; and P26 
(50°N, 145°W), 14 June), 2009-10 in August 2009 
(major stations: P4, 21 August; P12, 23 August; 
P26, 27 August) and 2010-01 in February 2010 (major 
stations: P4, 4 February; P12, 11 February) 
(Supplementary Figure S1). At these stations, large 
volume (20 L) samples for DNA isolation were 
collected from the surface (10 m), whereas 120 L 
samples were taken from three depths spanning the 
OMZ core and upper and deep oxyclines (500 m, 
1000 m and 1300 m at station P4; 500 m, 1000 m and 
2000 m at station P12). Sampling at Saanich Inlet (SI) 
station S3 (48°35.30N, 123°30.22W) was performed as 
previously described (Zaikova et al., 2010) as part of a 
monthly monitoring program aboard the MSV John 
Strickland. Sample collection and filtration protocols 
can be viewed as visualized experiments at 
http://www.jove.com/video/1159/ (Zaikova et al., 2009) and 
http://www.jove.com/video/1161/ (Walsh et al., 2009), 
respectively. 



Environmental DNA extraction 

DNA was extracted from Sterivex filters as described 
by Zaikova and colleagues (Zaikova et al., 2010) and 
DeLong and colleagues (DeLong et al., 2006). The 
DNA extraction protocol can be viewed as a 
visualized experiment at 
http://www.jove.com/video/1352/ (Wright et al., 2009). 



Phylogenetic analysis and tree construction using MGA 
16S rRNA gene sequences 

Phylogenetic analysis and tree construction using 
full-length 16S rRNA gene clone sequences from 
the NESAP and SI and 16S rRNA gene sequences 
identified on large-insert DNA fragments was per- 
formed as reported previously (Allers et al., 2012); 
see Supplementary Methods for details. 

Fosmid library construction and end sequencing 
Thirty fosmid libraries (~7680 clones/library) 
were constructed from DNA samples collected 
from Line P stations P4, P12, and P26 in June 
and August of 2009, and stations P4 and P12 
during February 2010 (Supplementary Table S1, 
Supplementary Figure S1). An additional 16 fosmid 
libraries were constructed from DNA samples 
collected from SI station S3 during the 2006-2007 
seasonal stratification and deep-water renewal cycle 
(Supplementary Table S1, Supplementary Figure S1) 
(Walsh et al., 2009). Further details on fosmid 
library construction and sequencing can be found 
in Supplementary Methods. 

Fosmid library screening, preparation, and full-length 
sequencing 

Twenty-three of the 46 fosmid end-sequenced 
libraries described above, including 7 from Line P 
and 16 from SI, were screened for the presence of 
16S rRNA genes using the NAST aligner 
(DeSantis et al., 2006a) and BLAST using default 
parameters against the 2008 Greengenes database 
(DeSantis et al., 2006) (Supplementary Figure S1). 
After preliminary phylogenetic analyses, 14 fosmid 
clones containing MGA-affiliated 16S rRNA genes 
were selected for complete sequencing (8 fosmids 
from Line P libraries and 6 fosmids from SI libraries; 
Table 1, Supplementary Figure S1). For sequencing 
protocols, see Supplementary Methods. 

GC content and oligonucleotide frequency analysis 
GC content of large-insert DNA fragments (14 fosmids 
from NESAP and SI in addition to four large-insert 
DNA fragments from other North Pacific Ocean 
environments; Table 1) was calculated using 
gccontent.pl with default parameters, available for 
download at https://github.com/hallamlab/utilities. 
Tetranucleotide frequencies were calculated as 
normalized Z-scores using TETRA (Teeling et al., 
2004a,b; http://www.megx.net/tetra). Principal com- 
ponent analysis was performed on normalized 
Z-score profiles for each insert using PRIMER 
V6.1.13 (Clarke, 1993; Clarke and Gorley, 2006). 
Principal component analysis was overlaid 
with clusters determined by Hierarchical Cluster 
Analysis of normalized Z-scores using a Euclidean 
distance matrix (also performed in PRIMER). 
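
As a rough illustration of the two compositional measures described above, the following Python sketch computes GC content and raw tetranucleotide frequencies for a single sequence. It is not gccontent.pl or TETRA: the function names are hypothetical, and TETRA reports Z-scores relative to expected frequencies, whereas this sketch stops at observed relative frequencies.

```python
from collections import Counter
from itertools import product


def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    counts = Counter(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (counts["G"] + counts["C"]) / acgt


def tetranucleotide_frequencies(seq: str) -> dict:
    """Observed relative frequency of each of the 256 tetranucleotides.

    Overlapping windows are counted; windows containing ambiguous
    bases (anything other than A, C, G, T) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in kmers)
    return {k: (counts[k] / total if total else 0.0) for k in kmers}
```

Frequency profiles of this kind, one vector of 256 values per insert, are the sort of input that an ordination such as principal component analysis operates on.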

Global nucleotide similarity analysis 
Global nucleotide similarity in large-insert DNA 
fragments was determined by performing pairwise 
blastn comparisons between all fragments using 
onecircos.pl with default settings for all parameters 
except percent_identity (-p), which was calculated 
at 50%, 80%, 90% and 95% in separate analyses. 
onecircos.pl is available for download at: 
https://github.com/hallamlab/utilities and is based on 
Circos (http://circos.ca/; Krzywinski et al., 2009). 
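
The effect of varying the percent identity cutoff can be sketched as follows. This is an assumption-laden illustration, not the logic of onecircos.pl: it assumes the pairwise blastn results are available in the standard 12-column BLAST tabular format (query, subject, percent identity, ..., bit score), and the fragment names in the example are made up.

```python
def filter_hits(tabular_lines, min_identity):
    """Keep non-self hits at or above min_identity from BLAST tabular output.

    Each line is expected to hold the 12 standard tab-separated columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore.
    """
    kept = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity, bitscore = float(fields[2]), float(fields[11])
        if query != subject and identity >= min_identity:
            kept.append((query, subject, identity, bitscore))
    return kept


# Hypothetical hits between three made-up fragments, plus one self-hit.
example = [
    "fragA\tfragA\t100.0\t1500\t0\t0\t1\t1500\t1\t1500\t0.0\t2771",
    "fragA\tfragB\t96.5\t1200\t42\t0\t1\t1200\t301\t1500\t0.0\t1985",
    "fragA\tfragC\t85.0\t900\t135\t0\t10\t909\t50\t949\t0.0\t950",
    "fragB\tfragC\t62.0\t300\t114\t0\t5\t304\t20\t319\t1e-20\t98.7",
]

# Number of retained hits at each cutoff used in the text.
counts = {cutoff: len(filter_hits(example, cutoff)) for cutoff in (50, 80, 90, 95)}
```

Raising the cutoff progressively removes the weaker links, which is why separate analyses at several thresholds help distinguish tight groups from a gradient of similarity.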

Open reading frame prediction and gene annotation 
Open reading frames (ORFs) were predicted and 
annotated using the in-house MetaPathways pipeline 
(Konwar et al., 2013), available for download 
at: http://hallam.microbiology.ubc.ca/MetaPathways/. 
Briefly, primary nucleotide sequences from large-insert 
DNA fragments were quality controlled for 
ambiguous bases and file-format errors. ORFs were 
predicted using Prodigal (Hyatt et al., 2010). ORFs 
shorter than 60 amino acids in length were removed, 
and the remaining ORFs were annotated using Protein BLAST (Altschul 
et al., 1990) (bit-score ratio >0.4 (Rasko et al., 2005), 
e-value cutoff of 1e-5) against the RefSeq (Pruitt and 
Maglott, 2001), KEGG (Kanehisa and Goto, 1999), 
COG (Tatusov et al., 2001) and MetaCyc (Karp et al., 
2000) databases. Annotations were assigned to 
predicted ORFs based on the following four criteria: 
(i) the BLAST hit with top e-value was selected from 
each database; (ii) each BLAST hit was assigned an 
'information score' based on the sum of distinct and 
shared enzymatic words (prepositions, articles and 
auxiliary verbs were removed) with a preference for 
Enzyme Commission numbers (+10 score); (iii) the 
annotation with the highest score was selected and 
assigned to the respective ORF; (iv) ORFs with no 
hits were assigned the annotation 'hypothetical 
protein'. 
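
The selection criteria above can be sketched in Python as follows. This is one possible reading of the 'information score', not the MetaPathways implementation: the stop-word list, the way shared words are counted, the helper names and the example descriptions (including the placeholder EC number 9.9.9.9) are all assumptions made for illustration. The bit-score ratio helper follows the usual definition of hit bit-score divided by the bit-score of the query aligned to itself.

```python
import re

# Assumed stop words (prepositions, articles, auxiliary verbs).
STOPWORDS = {"a", "an", "the", "of", "in", "to", "for", "and", "or",
             "with", "by", "is", "are", "was", "be", "has", "have"}
EC_PATTERN = re.compile(r"\bEC[ :]?\d+\.(?:\d+|-)\.(?:\d+|-)\.(?:\d+|-)")


def passes_bit_score_ratio(hit_bits, self_bits, cutoff=0.4):
    """True if the hit bit-score divided by the self-hit bit-score exceeds cutoff."""
    return self_bits > 0 and hit_bits / self_bits > cutoff


def informative_words(description):
    """Distinct lower-cased words left after dropping EC numbers and stop words."""
    text = EC_PATTERN.sub(" ", description).lower()
    words = re.findall(r"[a-z][a-z0-9-]*", text)
    return {w for w in words if w not in STOPWORDS}


def information_score(description, other_descriptions):
    """Distinct words + words shared with other hits + 10 if an EC number is present."""
    words = informative_words(description)
    others = set()
    for other in other_descriptions:
        others |= informative_words(other)
    score = len(words) + len(words & others)
    if EC_PATTERN.search(description):
        score += 10
    return score


def choose_annotation(top_hits):
    """Pick the best-scoring top hit; top_hits maps database name to description."""
    if not top_hits:
        return "hypothetical protein"
    best_db = max(
        sorted(top_hits),  # sorted for a deterministic choice on ties
        key=lambda db: information_score(
            top_hits[db], [d for o, d in top_hits.items() if o != db]),
    )
    return top_hits[best_db]
```

For example, given a RefSeq top hit described only as "hypothetical protein" and a KEGG top hit carrying an enzyme name and an EC number, the KEGG description scores higher and is assigned to the ORF.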

Amino acid similarity analysis 

Predicted amino acid similarity of large-insert 
DNA fragments was plotted in Trebol (available 
for download at: http://bioinf.udec.cl/trebol) using 
tblastx with a minimum bit-score cutoff of 50. COG 
categories present on large-insert fragments were 
plotted using tblastn of COG proteins against large-insert 
DNA fragments with a minimum e-value 
cutoff of 1e-4. 



Fragment recruitment of fosmid end sequences 
Coverage plots relating fosmid end sequences from 
individual NESAP and SI fosmid end libraries to 
large-insert DNA fragments were generated using 
the Nucmer program implemented in MUMmer 3.23 
(Kurtz et al., 2004) as cited in Hallam et al. (2006). 



Further details on fragment recruitment can be 
found in Supplementary Methods. 

Phylogenetic analysis of PsrABC 
Protein sequences (including predicted protein 
sequences for PsrA, PsrB, and PsrC identified on 
fosmids FPPP_13C3 and 122006-105) were aligned 
using MUSCLE v3.6 with default parameters (Edgar, 
2004). For the purposes of this analysis, the PsrBC 
fusion proteins encoded by psrBC on fosmid FPPP_13C3 
and on certain reference sequences were 
divided into PsrB and PsrC subunits and analyzed 
in separate trees. Phylogenetic analyses were performed 
using PHYML (Guindon et al., 2005) with 
a WAG model of amino-acid substitution, where the 
parameter of the gamma (Γ) distribution and the proportion 
of invariable sites were estimated for each data set. 
The confidence of each node was determined by 
assembling a consensus tree of 100 bootstrap replicates. 
The presence of TAT signal sequences on PsrA proteins 
was predicted using TatP 1.0 (Bendtsen et al., 2005), 
available at: http://www.cbs.dtu.dk/services/TatP/. 

Results 

Physicochemical characteristics of the NESAP and SI 
This study was conducted along the Line P transect 
of the NESAP (Supplementary Figure S1), beginning 
in SI, Vancouver Island, British Columbia 
(SI, Station S3: 48°58'N, 123°50'W) and ending 
at Ocean Station Papa (also referred to as 
station P26: 50°N, 145°W) (Freeland, 2007). Owing 
to strong stratification and sluggish circulation 
of the interior NESAP waters, a large region of 
O₂-deficient (<90 µmol kg⁻¹) water containing dysoxic 
(20-90 µmol kg⁻¹) and suboxic (1-20 µmol kg⁻¹) 
compartments spans from ~400 m to 2000 m 
in depth, resulting in a persistent OMZ 
(O₂ <20 µmol kg⁻¹). The OMZ is centered at 
1000 m, wherein dissolved O₂ concentrations typically 
drop to ^Oiimolkg- 1 (Whitney et al, 2007). During 
the past 50 years of oceanographic observation, 
O₂ concentrations in the OMZ of coastal to open-ocean 
regions of the NESAP have not been observed to 
reach anoxic (<1 µmol kg⁻¹) levels. However, interior 
and basin waters of SI typically experience seasonal 
periods of anoxia and sulfide accumulation on an 
annually recurring basis (Anderson and Devol, 
1973; Lilley et al., 1982; Ward et al., 1989). 
Physicochemical data from basin (S3), coastal (P4), 
transition (P12) and open-ocean (P26) stations 
measured along the Line P transect relevant to 
the present study are provided in Table 1 and 
Supplementary Table S1. 

Taxonomic diversity of MGA in the NESAP and SI 
To identify 16S rRNA genes affiliated with MGA 
inhabiting SI waters, we screened 19 previously 
published bacterial 16S rRNA gene clone libraries 
(containing a total of 6645 sequences) generated 
from samples traversing the water column during 
the 2006-2007 seasonal stratification and deep-water 
renewal cycle and during the spring stratification in 
2008 at Station S3 (Supplementary Table S1; Walsh 
and Hallam, 2011). A total of 415 16S rRNA gene 
sequences affiliated with MGA were recovered from 
SI clone libraries. These sequences were added to a 
data set containing 290 MGA 16S rRNA sequences 
previously reported from Line P stations P4, P12, 
and P26 (Allers et al., 2012) and clustered at 97% 
identity, forming 156 distinct operational taxonomic 
units (OTUs), 120 of which contained only single- 
tons. Representative sequences were obtained for each 
non-singleton OTU and placed in phylogenetic con- 
text with relevant reference sequences from other 
locations (Supplementary Figure S2). Five out of 10 
previously defined MGA subgroups were recovered in 
SI clone libraries (ZA3648c and ZA3312c (Fuchs, 
unpublished); Arctic96B-7 (Bano and Hollibaugh, 
2002); SAR406 (Gordon and Giovannoni, 1996); and 
A714018 (Allers et al., 2012)), and three novel 
subgroups were identified (SHBH1141, SHBH391, 
and SHAN400) (Supplementary Figure S2). These 
novel subgroups were found exclusively in SI and 
contained the most abundant OTUs identified in this 
location (Supplementary Figures S2, S3). 
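
The clustering of 16S rRNA gene sequences into OTUs at 97% identity, mentioned above, can be illustrated with a simple greedy procedure. The clustering method actually used is not described in this section, so the seed-based approach below is a stand-in chosen for brevity; it assumes the sequences are already aligned to equal length, and the function names are hypothetical.

```python
def percent_identity(a, b):
    """Identity over aligned columns in which neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def greedy_otus(aligned, threshold=97.0):
    """Assign each sequence to the first OTU whose seed it matches, else seed a new OTU.

    aligned maps sequence names to aligned sequences; the first member
    of each returned list is the seed of that OTU.
    """
    otus = []
    for name, seq in aligned.items():
        for members in otus:
            if percent_identity(aligned[members[0]], seq) >= threshold:
                members.append(name)
                break
        else:
            otus.append([name])
    return otus


def count_singletons(otus):
    """Number of OTUs that contain exactly one sequence."""
    return sum(1 for members in otus if len(members) == 1)
```

With the counts reported above, 156 OTUs of which 120 are singletons leaves 36 non-singleton OTUs for which representative sequences were obtained.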

As described by Allers and colleagues (Allers 
et al., 2012), MGA sequences identified in coastal 
and open ocean waters of the NESAP comprised 
0.7 ± 0.84% of 10 m clone libraries and 11.2 ± 3.9% 
of clone libraries from O₂-deficient waters, with a 
maximum of 16.4% at P26 1000 m. The most 
abundant MGA OTUs present in these locations 
comprised between 1% and 4% of clone libraries 
and belonged to subgroups Arctic95A-2, ZA3312c, 
Arctic96B-7, SAR406, and HF770D10, in order of 
decreasing OTU abundance (Supplementary Figures 
S2, S3). In comparison, MGA OTUs identified in SI 
comprised 1.6 ± 0.81% of 10 m clone libraries and 
7.1 ± 3.6% of clone libraries from O₂-deficient 
waters. The most abundant OTUs present in SI 
comprised between 1% and 5% of clone libraries, 
and belonged to subgroups SHBH391, SHAN400, 
SHBH1141, ZA3312c, SAR406 and Arctic96B-7, in 
order of decreasing OTU abundance (Supplementary 
Figures S2, S3). 



Characterization and phylogenetic assignment of large- 
insert DNA fragments 

To connect 16S rRNA-based patterns of distribution 
across the oxycline in the NESAP and SI to genomic 
information associated with specific MGA subgroups, 
we screened 23 end sequenced fosmid libraries for the 
presence of clones containing 16S rRNA gene 
sequences (Supplementary Figure S1). Collectively, 
fosmid end libraries contained a total of 164 736 
genomic clones representing 255.3 Mb of environmental 
genomic DNA (Supplementary Table S1). Screening 
of fosmid end sequences for 16S rRNA genes 
uncovered 14 fosmid inserts containing partial or 
full-length 16S rRNA gene sequences affiliated with 
MGA (Table 1; Supplementary Figure S1). These 14 
fosmid inserts were fully sequenced (Materials and 
methods) for downstream analyses, generating 
~540 kb of DNA sequence linked to MGA. In 
addition, four large-insert DNA fragments from 
Hawaii Ocean Time-series Station ALOHA 
(DeLong et al., 2006; Rich et al., 2011) and Monterey 
Bay (Suzuki et al., 2004) harboring MGA 16S rRNA 
gene sequences were identified in public databases 
and used in comparative analyses (Table 1; 
Supplementary Figure S1). 

To identify subgroup affiliations, all 18 MGA 16S 
rRNA gene sequences identified on large-insert 
fragments from North Pacific Ocean environments 
were placed into the MGA reference tree described 
above (Supplementary Figure S2). Seventeen out 
of 18 16S rRNA gene sequences identified on large- 
inserts grouped with 10 defined MGA subgroups. 
The remaining 16S rRNA gene (on fosmid 4050020-J15) 
appeared to group outside of MGA and was most closely 
affiliated with sequences in the phylum Deferribacteres. 
We chose to include this fosmid in downstream 
analyses to represent a close relative of MGA. 



Genomic content and organization of large-insert DNA 
fragments derived from MGA 

Four criteria were used to determine the extent 
to which large-insert DNA fragments partitioned 
into groups consistent with shared environmental 
context or phylogenetic association, including GC 
content, tetranucleotide frequency, global nucleo- 
tide similarity, and amino acid similarity of pre- 
dicted ORFs. The size of the large-insert fragments 
containing MGA 16S rRNA genes ranged from 
27.4 kb to 43.5 kb with a GC content ranging from 
32.8% to 47.7% (Table 1). Large-insert fragments 
did not differentiate into discrete groups based on 
similar GC content (Table 1) or tetranucleotide 
frequency (Supplementary Figure S4). To further 
investigate potential similarities among fragments 
associated with nucleotide arrangement, pairwise 
blastn analyses were performed between all 
fragments (Figure 1). Bit-scores for pairwise blastn 
analyses ranged between 0 and 4.5 × 10⁴ for non-identical 
fragments. Large-insert fragments from 
Monterey Bay (EBAC750-03B02) and the NESAP 
(1250012-L08 and 4130011-107), affiliated with 
subgroup Arctic95A-2, were most similar to one 
another and formed a distinct group based on global 
nucleotide similarity (Figure 1). The remaining 
inserts did not form distinct groups based on global 
nucleotide similarity, but displayed a gradient of 
similarity, with bit scores for pairwise blastn analyses 
averaging (2.2 ± 1.5) × 10³. Fosmid 122006-105, 
affiliated with subgroup P262000D03, was the most 
distinct at the nucleotide level. 

To investigate potential similarities among large- 
insert fragments at the protein-coding level, ORFs 
were predicted and annotated (Materials and methods). 
The number of predicted ORFs per insert ranged 
from 14 to 49, and the number of ORFs on each 
fragment annotated as 'hypothetical protein' ranged 
from 11 to 39 (51-92% of ORFs per insert) (Table 1). 
Four groups with shared but not identical amino- 
acid sequences of predicted ORFs surrounding the 
16S rRNA gene were identified (groups I-IV), 
whereas the Deferribacteres-affiliated fosmid 
(4050020-J15) did not show significant similarity 
to any other fragments at the protein-coding level 
and was placed in its own group (group V) 
(Figure 2). These groups did not uniformly correlate 
with shared environmental origin or 16S rRNA 
sequence identity at the level of defined subgroups 
(Table 1, Supplementary Figure S2). In some cases it 
was clear that fosmid groups represented different 
flanking regions of the rRNA operon (that is, groups 
I and II; Figure 2, Supplementary Figure S5). 

Four out of eight fosmids in group I were 
affiliated with the Arctic96B-7 subgroup, whereas 
the remaining four fosmids were affiliated with 
ZA3312c, SHBH391, SAR406 and SHAN400. Fosmids 
in group I contained a conserved gene cluster with 
genes encoding glucosamine-fructose-6-phosphate 
aminotransferase (involved in glucosamine bio- 
synthesis), GMP synthase (involved in purine 
nucleotide biosynthesis) and acetyl-coenzyme 
A carboxylase carboxyl transferase subunits alpha 
and beta (potentially involved in fatty acid biosynthesis 
or CO₂ fixation) (Figure 2, Supplementary 
Figure S5). SI fosmid FPPZ_5C6 also contained a 
gene encoding RNA polymerase sigma-70 factor 
(rpoE), known to have a role in high temperature 
and oxidative stress response (Hild et al., 2000). 
Fosmid HF0010_18O13 contained the conserved 
cluster of genes found in group I fosmids as well 
as a cluster of cytochrome c oxidase subunit genes 
present in Fe(II) oxidation and three pentose 
phosphate pathway genes also found in group III 
fosmids (ribulose-phosphate 3-epimerase, ribose- 
5-phosphate isomerase b and transketolase). 

Group II fosmids were affiliated with Arctic96B-7 
and P262000D03. Both fosmids in this group 
(FPPP_13C3 and 122006-105) contained a cluster 
of genes encoding enzymes involved in the pen- 
tose phosphate pathway of carbon metabolism, 
including ribulose-phosphate 3-epimerase, ribose-5-phosphate 
isomerase b, and in one case, transketolase. 
Both fosmids also contained an operon 
encoding an enzyme complex related to polysulfide 
reductase (Psr). The operon on fosmid 122006-105 
contained three genes encoding homologs of the 
three Psr subunits: PsrA, a molybdopterin oxidoreductase; 
PsrB, a [4Fe-4S]-binding subunit; and 
PsrC, a membrane anchor subunit carrying the site 
of quinol oxidation, whereas the operon on fosmid 
FPPP_13C3 contained two genes encoding PsrA and 
a PsrBC fusion protein. Fosmid FPPP_13C3 contained 
additional neighboring genes encoding 
molybdenum cofactor and molybdopterin biosynth- 
esis proteins potentially associated with the assem- 
bly of the molybdenum and molybdopterin guanine 
dinucleotide-containing subunit PsrA. Fosmid 
FPPP_13C3 also contained a gene for glutamate 
synthase, often involved in nitrogen assimilation 
(Vanoni and Curti, 2008), and a gene for rubrery- 
thrin, involved in oxidative stress protection in 
some anaerobic bacteria and archaea (deMare et al., 
1996; Sztukowska et al., 2002). Fosmid 122006-105 
contained a gene encoding a rhodanese-like protein, 
belonging to a superfamily of sulfur transferases 
(Cipollone et al., 2007), upstream of the Psr operon. 



All three genomic inserts belonging to group III 
were affiliated with subgroup Arctic95A-2 and were 
derived from Monterey Bay and the NESAP (Table 1, 
Figure 2, Supplementary Figure S6). These three 
fosmids also formed a discrete group based on global 
nucleotide similarity analysis (Figure 1). The main 
organizational feature shared by these inserts was 
a set of genes encoding transporters, including 
an ABC-type multidrug transporter, ATPase component, 
ABC-2 permease and a TonB-dependent 
receptor. Group III inserts also contained genes 
encoding succinyl-diaminopimelate desuccinylase, 
involved in lysine biosynthesis. Monterey Bay insert 
EBAC750-03B02 contained a gene affiliated with 
methionine sulfoxide reductase (msrB). In Escherichia 
coli, MsrB has been shown to have sulfoxide 
and dimethyl sulfoxide reductase activity (Grimaud 
et al., 2001). This insert also contained a gene 
encoding a rhodanese-like protein. 

Fosmids in group IV were affiliated with sub- 
groups P262000N21, SAR406, and A714018, and 
primarily contained genes encoding hypothetical 
proteins except for two conserved genes encoding 
an ATP-dependent protease Clp ATPase subunit and 
protease subunit. Group IV fosmid HF4000_22B16 
was assembled as two unordered pieces; as such, it 
contained a break point within the 23S rRNA gene 
(Supplementary Figure S6). 

Figure 3 Dot plot showing the proportion of fosmid end sequenced libraries recruiting to MGA large-insert DNA fragments at various 
sample locations and depths in the NESAP (at stations P4, P12, and P26) and SI (station S3). Hollow circles represent proportion of 
fosmid end sequenced libraries recruiting to large-insert fragments with nucleotide similarity 60-80%; solid circles >80%. 

The only fosmid in group V (4050020-J15; most 
closely related at the 16S rRNA gene sequence level 
to members of the phylum Deferribacteres) did not 
exhibit much protein similarity to any of the MGA- 
affiliated fosmids. This fosmid contained genes for 
NADH-ubiquinone and quinone oxidoreductase 
involved in energy metabolism, a major facilitator 
superfamily transporter, a dihydroorotate dehydro- 
genase, a cell wall associated hydrolase, and a tRNA 
nucleotidyltransferase, in addition to genes encod- 
ing a number of hypothetical proteins. 



Population structure of MGA syntenic groups 
To determine the prevalence and distribution of 
MGA subgroups represented by large-insert DNA 
fragments detected in this study, the proportion 
of fosmid end sequences from each NESAP and SI 
library recruiting to large-insert fragments was 
determined (Figure 3). The largest proportions of 
sequences recruiting to large-insert fragments were 
derived from depths ≥500 m in the NESAP and 
≥100 m in SI. A very small proportion of end 
sequences were recruited from Aug-09 P26 libraries, 
which could be due to the relatively small size of 
these libraries (Supplementary Table S1). End 
sequences from NESAP libraries generally recruited 
to large-insert fragments in larger numbers and with 
a higher degree of nucleotide similarity than end 
sequences from SI libraries, even for large-insert 
fragments derived from SI (Figure 3). End sequences 
similar to group III fragments were most highly 
and consistently represented in NESAP fosmid end 
libraries, followed by end sequences similar to 
several group I fragments. End sequences similar to 
the Deferribacteres-like fosmid 4050020-J15 were 
also well represented and very similar to sequences 
derived from oxic through suboxic (but not anoxic) 
NESAP and SI libraries. 



Phylogenetic analysis and distribution of Psr 
To gain insight into the evolutionary history of psrA, 
psrB, and psrC genes detected on MGA fosmids, 
phylogenetic trees of their predicted protein pro- 
ducts were constructed. Phylogenetic analysis of the 
catalytic subunit, PsrA, confirmed that predicted 
PsrA homologs detected on MGA fosmids were most 
closely related to Psr and thiosulfate reductase (Phs) 
of the dimethyl sulfoxide reductase family of 
molybdenum-containing enzymes (Supplementary 
Figure S7). Predicted PsrA homologs from MGA 
fosmids were ~63% similar to one another, and 
most closely related to proteins encoded on 
fosmids from the Mediterranean Sea and Monterey 
Bay derived from Marine Group II euryarchaeota 
(Figure 4a). Predicted MGA proteins were less 
similar to canonical PsrA proteins originally characterized 
in Wolinella succinogenes (Krafft et al., 
1995). Phylogenetic trees of predicted PsrB with 
PsrB-like respiratory proteins containing [4Fe-4S]-binding 
subunits and of predicted PsrC with PsrC-like 
membrane anchor subunits indicated similar 
phylogenetic relationships (Figures 4b and c). 
Predicted PsrA proteins encoded on MGA fosmids 
did not contain any obvious signal sequences (for 
example, twin-arginine translocation (TAT) signal 
sequences), suggesting that these proteins are located 
in the cytoplasm, similar to PsrA proteins detected 
in most green sulfur bacteria (GSB) (Frigaard and 
Bryant, 2008). PsrA of W. succinogenes, by 
comparison, carries a TAT signal sequence and is 
translocated into the periplasm (Krafft et al., 1995). 

Figure 4 Unrooted phylogenetic trees based on protein sequences with homology to (a) predicted polysulfide reductase molybdopterin- 
containing subunit (PsrA); (b) predicted [4Fe-4S]-binding subunit (PsrB); and (c) membrane anchor subunit (PsrC) identified on fosmids 
FPPP_13C3 and 122006-105. The trees were inferred using maximum likelihood implemented in PhyML. Solid circle indicates proteins 
derived from organisms that have been demonstrated to grow by reducing elemental sulfur or polysulfide with concomitant H₂S 
production; hollow circle indicates presence of a psrBC gene fusion. The scale bar represents estimated number of amino-acid 
substitutions per site. Bootstrap values below 50% are not shown. 

In contrast to the organization of the psrABC 
operon originally described in W. succinogenes 
(Krafft et al., 1995), the ORFs encoding PsrB and 
PsrC homologs on both MGA fosmids were located 
upstream of ORFs encoding PsrA homologs 
(Figure 2). Also in contrast to the W. succinogenes 
psrABC operon, the genes encoding PsrB and PsrC 
on fosmid FPPP_13C3 appeared to form a gene 
fusion (psrBC), a feature also detected in several 
PSR-containing GSB (Frigaard and Bryant, 2008) 
and other PSR-containing bacteria and archaea. 



The psrBCA format of operon organization detected 
on MGA fosmids was also detected on Marine Group 
II fosmids and several PSR-containing GSB in 
addition to Sulfurimonas denitrificans DSM 1251, 
Caldilinea aerophila DSM 14535, Chloroflexus 
aggregans DSM 9485, and Haladaptatus paucihalo- 
philus DX253 (Figures 4b and c). A third format of 
operon organization (psrACB) was detected in 
Sulfurimonas gotlandica GD1 and Sulfurihydrogenibium 
azorense Az-Fu1. 

To determine the prevalence of predicted MGA psr 
genes in NESAP and SI fosmid end sequenced 
libraries, the proportion of fosmid end sequences 
that recruited to the psrBCA operon on fosmids 
FPPP_13C3 and 122006-105 was calculated for each 
end sequenced library (Supplementary Figure S8). 
The majority of end sequences recruiting to psr genes 
were derived from ≥500 m depth in the NESAP and 
≥100 m depth in SI, and psr homologs were most 
consistently present throughout O₂-deficient waters of 
the NESAP in August 2009 at station P4. 



Discussion 

The 17 large-insert DNA fragments containing MGA 
16S rRNA genes derived from North Pacific Ocean 
metagenomic libraries were affiliated with seven 
previously defined and two novel MGA subgroups, 
whereas the 16S rRNA gene on an 18th insert was 
more closely related to the phylum Deferribacteres 
(Supplementary Figure S2). Although large-insert
DNA fragments were obtained from multiple environ- 
ments manifesting distinct oxyclines, fragments did 
not coalesce into coherent groups based on GC 
content, tetranucleotide frequency, or global nucleo- 
tide similarity. However, fragments did coalesce into 
five syntenic groups based on shared amino acid 
similarity of predicted ORFs. Group membership 
was not generally consistent with shared environmental
origin, O₂ concentration, or 16S rRNA gene
sequence identity (Table 1). These observations
could be explained in several ways. MGA subgroups 
may contain multiple unlinked copies of the 16S 
rRNA operon (Acinas et al., 2004). Alternatively, 
large-insert fragments may be derived from flanking 
regions of the same 16S rRNA operon, as observed 
for syntenic groups I and II. It is also possible that 
subgroups ZA3312c through A714018 actually 
represent one subgroup of MGA, evidenced by a 
lack of bootstrap support for nodes encompassing 
these subgroups within the MGA 16S rRNA gene 
tree (Supplementary Figure S2). 

Recruitment of fosmid end sequences from 
Line P and SI libraries to large-insert DNA fragments 
reflected 16S rRNA-based patterns of MGA distribu- 
tion in that the proportion of MGA sequences was 
maximal in waters ≥500 m depth in the NESAP and
≥100 m depth in SI (Figure 3). MGA sequences
comprised a much larger proportion of NESAP 
(open ocean) than SI (coastal basin) end libraries, a 
pattern also reflected in MGA 16S rRNA distribution 
and representative of the overall higher proportion 
of MGA detected in the NESAP than in SI microbial 
communities (Zaikova et al., 2010; Allers et al.,
2012). The proportion of SI end sequences recruiting 
to large-insert fragments was maximal in dysoxic 
and suboxic samples from Nov 2006 and April 2007, 
supporting the hypothesis that dominant MGA 
subgroups are adapted to O₂-deficiency in this
location. The largest proportion of end sequences 
from NESAP libraries recruited to group III fragments 
affiliated with subgroup Arctic95A-2, supporting 16S
rRNA-based observations that Arctic95A-2 is a
dominant subgroup in the NESAP open ocean.
Group I fragments affiliated with subgroups
Arctic96B-7 and SAR406 also recruited a relatively large
proportion of NESAP end sequences. A reasonable
proportion of end sequences from the NESAP and SI 
libraries also recruited to Deferribacteres-like fosmid 
4050020-J15, with a pattern of distribution suggesting 
adaptation to suboxic and dysoxic, but not anoxic, 
conditions. 

Although large-insert fragments did not clearly 
partition into ecologically distinct groups based on 
O₂ concentration, predicted protein-coding genes
associated with adaptation to O₂-deficiency and
sulfur-based energy metabolism were detected on
multiple fosmids. With respect to adaptation to
O₂-deficiency, a gene (rpoE) encoding an RNA polymerase
sigma-70 factor, known to have a role in oxidative 
stress response, was detected on SI fosmid
FPPZ_5C6, obtained from an anoxic-sulfidic 200 m
sample. A gene encoding rubrerythrin, also involved 
in oxidative stress response in some anaerobic 
prokaryotes, was detected on SI fosmid FPPP_13C3, 
obtained from an oxic 10 m sample. With respect to
sulfur-based energy metabolism, a gene encoding
methionine sulfoxide reductase (MsrB), which in
E. coli has been shown to have sulfoxide and
dimethyl sulfoxide reductase activity (Grimaud
et al., 2001), was detected on Monterey Bay insert
EBAC750-03B02. In addition, four fosmids encoded
rhodanese-like proteins, affiliated with a super- 
family of sulfur transferases. Perhaps most interest- 
ingly, a psr operon was detected on SI fosmid 
FPPP_13C3 and on NESAP fosmid 122006-105, 
obtained from a dysoxic 2000 m sample. Sequences 
similar to psr genes encoded on these fosmids were 
also detected in a number of fosmid end sequenced 
libraries derived from ≥500 m depth in the NESAP
and ≥100 m depth in SI, suggesting that these genes
are associated with O₂-deficient environments
(Supplementary Figure S8). In the anaerobic
epsilonproteobacterium W. succinogenes, PSR and
hydrogenase or formate dehydrogenase allow
respiration on polysulfide (Sₙ²⁻) using H₂ or formate
as an electron donor, with concomitant production
of H₂S (Jankielewicz et al., 1995). The PSR complex
isolated from W. succinogenes has also been
documented to catalyze sulfide oxidation to polysulfide
by dimethylnaphthoquinone, albeit with much
lower efficiency (Hedderich et al., 1999). The
identification of proteins homologous to PSR on 
two fosmids suggests that specific MGA subgroups 
may have the capacity to generate energy via 
dissimilatory polysulfide reduction to hydrogen 
sulfide (H₂S) or via dissimilatory H₂S oxidation
(Schroder et al., 1988; Klimmek et al., 1991; Krafft
et al., 1992, 1995; Jormakka et al., 2008).

The PSR complex of W. succinogenes is encoded 
by the psrABC genes and consists of two periplas- 
mic subunits (a catalytic molybdopterin-containing 
PsrA subunit and a [4Fe-4S]-binding PsrB subunit)
and a membrane-anchoring PsrC subunit (Krafft 
et al., 1992). Predicted PsrA proteins detected on 
MGA fosmids were only distantly related to isolated 
PsrA from W. succinogenes but more closely related 
to PsrA homologs encoded on Marine Group II 
euryarchaeotal fosmids derived from the Mediterra- 
nean Sea and Monterey Bay. PsrA proteins detected 
on MGA fosmids were also similar to PsrA homologs 
found in the GSB Prosthecochloris aestuarii DSM
271, Chlorobium chlorochromatii CaD3, Chlorobium
luteolum DSM 273, Chlorobium limicola DSM
245, and Chlorobium phaeobacteroides DSM 266;
the halophilic euryarchaeon Haladaptatus
paucihalophilus DX253; the thermophilic Chloroflexi strain
Caldilinea aerophila DSM 14535; the thermophilic
Aquificales strain Sulfurihydrogenibium azorense
Az-Fu1; and the sulfur-oxidizing Epsilonproteobacteria
Sulfurimonas gotlandica GD1 and Sulfurimonas
denitrificans DSM 1251. Interestingly, in
GSB, the phylogeny of PsrA homologs is congruent
with a number of phylogenetic anchor genes,
suggesting that PSR was present in the last common
ancestor of PSR-containing GSB (Gregersen et al.,
2011). Given the proximal phylogenetic relationship
of MGA and GSB based on 16S rRNA gene sequences 
(Supplementary Figure S2), it is possible that MGA 
inherited this operon from a common ancestor. The 
psrBC genes on MGA fosmid 122006-105 were 
encoded by separate ORFs (psrB and psrC), whereas
in fosmid FPPP_13C3, these genes were fused
(psrBC). A psrBC gene fusion has been described
previously in members of the PSR-containing GSB
(including P. aestuarii, C. chlorochromatii, and C.
luteolum; Frigaard and Bryant, 2008), and was
detected in Marine Group II fosmids from the 
Mediterranean Sea and Monterey Bay in addition to 
H. paucihalophilus and C. aerophila. The broad 
phylogenetic origins of psrABC genes similar to those 
detected on MGA fosmids are consistent with multi- 
ple lateral transfer events across phyla and domains. 

Although direct evidence for the role of PSR in 
sulfur-based energy metabolism has only been 
obtained from W. succinogenes, many cultivated
reference strains encoding PSR are capable of 
generating energy using sulfur compounds. The 
PSR sequences derived from several such reference 
strains, including S. azorense Az-Fu1 and the GSB,
branched with predicted PSR homologs detected on
MGA fosmids. S. azorense Az-Fu1 is capable of
growth by coupling reduction of elemental sulfur
(S⁰) to hydrogen oxidation, although polysulfide
was not directly tested as an electron acceptor
(Aguiar et al., 2004). S. azorense Az-Fu1 has also
been documented to oxidize S⁰ and sulfite (SO₃²⁻)
(Aguiar et al., 2004). Similarly, the cytoplasmic PSR
complex found in many GSB (including P. aestuarii,
C. chlorochromatii, C. luteolum, C. limicola and
C. phaeobacteroides) has been proposed to oxidize
sulfite produced by the dissimilatory sulfate reduc- 
tion (Dsr) system (Gregersen et al., 2011). Although 
the actual substrate of PSR cannot be determined 
based on sequence similarity alone, the phyloge- 
netic position of MGA PSR homologs provides a 
circumstantial link between MGA and sulfur cycling 
in the environment. 

Oxygen-deficient marine systems, including 
OMZs and permanent or seasonally stratified anoxic 
basins, are known to harbor active sulfur cycles that 
have been linked to the activities of sulfur-oxidizing 
Gamma- and Epsilonproteobacteria (Walsh et al.,
2009; Canfield et al., 2010; Grote et al., 2012).
Although this study provides only a glimpse into the 
metabolic diversity that is likely contained within 
the MGA candidate phylum, the presence of PSR 
homologs on MGA-affiliated genome fragments 
suggests a potential role for MGA in the cryptic 
sulfur cycle of O₂-deficient marine systems, where
the abundance of these bacteria is concentrated. 
Process rate measurements linking sulfur chemistry 
with MGA activity are required to support this 
hypothesis (Milucka et al., 2012). Given the lack of
cultivated representatives of MGA, the application of 
single-cell genomics could aid in providing the 
genome-wide information needed to fully describe 
the metabolic capacity of defined MGA subgroups 
residing in distinct locations (Woyke et al., 2009;
Swan et al., 2011; Stepanauskas, 2012). Such
high-resolution genomic data may provide additional
clues as to the evolutionary history and biogeochem- 
ical roles of these widely distributed marine bacteria. 